Abstract

The problem of mobile sequential recommendation is to suggest a route connecting some pick-up
points for a taxi driver so that he/she is more likely to get passengers with less travel cost.
A key challenge of this problem is its high computational complexity. In this paper, we propose a
dynamic programming based method to solve this problem. Our method consists of two separate
stages: an offline pre-processing stage and an online search stage. The offline stage pre-computes
optimal potential sequence candidates from a set of pick-up points, and the online stage selects
the optimal driving route based on the pre-computed sequences and the current position of an empty
taxi. Specifically, for the offline pre-computation, a backward incremental sequence generation
algorithm is proposed based on the iterative property of the cost function. Simultaneously, an
incremental pruning policy is adopted in the process of sequence generation to reduce the search
space of the potential sequences effectively. In addition, a batch pruning algorithm can also be
applied to the generated potential sequences to remove the non-optimal ones of a certain length.
Since the pruning effect continuously increases with the sequence length, our method can search
the optimal driving route efficiently in the remaining potential sequence candidates. Experimental
results on real and synthetic data sets show that the pruning percentage of our method is
significantly improved compared to the state-of-the-art methods, which allows our method to handle
the problem of mobile sequential recommendation with more pick-up points and to search the optimal
driving routes in arbitrary length ranges.

Key words: Mobile Sequential Recommendation, Potential Travel Distance, Backward Path Growth,
Sequence Pruning.

1 Introduction 

With the wide use of sensors, wireless communication and information infrastructures
such as GPRS, WiFi and RFID, we can easily access the location trace data of a large number of
moving objects. Finding useful knowledge in these trajectory data provides strong support for
real-time decisions and intelligent services in the related applications [1]. Reducing the
cruising cost of taxicabs is a typical example [2][3]. An unloaded taxi driving on the road not
only wastes fuel and time, but may also cause traffic congestion. However, some high-probability
pick-up points in the taxi trajectory data of high-yield drivers can be mined to guide new
drivers to pick up passengers in a more economical and efficient way. Therefore, high-efficiency
mobile pattern mining and recommendation algorithms can improve the business performance of the
drivers and reduce energy consumption. This problem has considerable theoretical
significance and practical value [3][4].

In [2], Ge et al. have proposed a novel problem of Mobile Sequential Recommendation (MSR), 
which is to suggest a route connecting some pick-up points for an empty cab so that the driver 
is more likely to get passengers with less travel cost starting from its current position. It is a 
challenging task, because we need to enumerate and compare all possible routes derived from the 
given set of pick-up points which involves a rather high computational complexity. To solve the 
MSR problem, they provided a function of Potential Travel Distance (PTD) for evaluating the 
cost of a driving route. Essentially, the PTD value of a suggested route is the expected travel 
distance for an empty cab before it successfully gets new passengers when it travels along the 
route. To reduce the computational cost, two effective potential sequence pruning algorithms LCP 
and SkyRoute, which are based on the monotone property of the PTD function, have been proposed 
in [2]. However, the time and space complexities of these two algorithms both grow exponentially 
with the number of pick-up points and the length of the suggested driving route, so they can only 
perform the driving route recommendation with a length constraint in a small number of pick-up 
points. 

However, in real applications, a driver always wants to obtain the optimal driving routes in a 
range of length, so that he/she can select a preferable driving route among them. In this paper, 
we consider a generalized mobile sequential recommendation problem with minimal and maximal 
length constraints. We propose a solution including an offline stage and an online stage. The offline 
stage effectively prunes the search space and generates a small set of sequence candidates. The 
online stage is for obtaining the optimal driving route given the current position of an unloaded 
taxi as the starting point. Specifically, for the offline pre-computation, we have deeply studied the 
nature of the PTD function and have found that it satisfies the iterative calculation feature. This 
feature allows us to incrementally construct a potential driving route backward from the terminal 
point to the starting point. Based on the above calculation feature of the PTD function, we have 
also found that a set of potential sequences with the same length and the same starting point 
satisfies the incremental and batch pruning properties. Then, we design a novel mobile sequential 
recommendation method which takes full advantage of the iterative nature of the PTD function. 
It incrementally generates potential sequences and discards a large part of the search space that
cannot contain an optimal route, which greatly improves the time efficiency and reduces the memory
consumption. Among the generated potential sequences with the same length, we can still remove a
large number of potential sequences which cannot form the optimal route by using a batch pruning
policy. This dramatically reduces the number of the remaining sequence candidates. Experimental
results show that the offline pruning effect and the online search efficiency of our method are
significantly improved compared to the existing state-of-the-art methods.

The main contributions of the paper are given as follows: 1) Our algorithm can generate all
possible sequence candidates of arbitrary length which can be used to suggest the driving route
with any length range constraint; 2) The recursive formula of the PTD function is presented which
makes the incremental generation of the potential sequences possible; 3) A backward incremental
sequence generation algorithm with less time and a smaller space complexity is proposed; 4) An
efficient method for comparing the PTD cost of different potential sequences and driving routes
is presented; 5) An effective sequence pruning method combining incremental pruning and batch
pruning is adopted which significantly improves the offline pruning effect.

Table 1: Adopted symbols.

| Symbol | Definition |
|---|---|
| $C$ | A set of potential pick-up points. |
| $c_i$ | A location point. It represents the current location of the cab when $i = 0$ and a pick-up point in $C$ when $i > 0$. |
| $N$ | The number of pick-up points in $C$. |
| $P(c_i)$ | The probability of successfully taking a passenger at $c_i$. |
| $D$ | The distance matrix of pairs of location points. |
| $D_{c_i,c_j}$ | The distance from $c_i$ to $c_j$. |
| $\vec{r}$ | The potential mobile sequence containing one or more different pick-up points. |
| $|\vec{r}|$ | The length of potential sequence $\vec{r}$ (i.e., the number of different pick-up points in $\vec{r}$). |
| $C_{\vec{r}}$ | The set of pick-up points in the potential sequence $\vec{r}$. |
| $P_{\vec{r}}$ | The probability vector of the pick-up points constituting the potential sequence $\vec{r}$. |
| $|A|$ | The number of elements in the set $A$. |
| $(c, \vec{r})$ | A driving route that travels the potential sequence $\vec{r}$ starting from the location point $c$. |
| $s(\vec{r})$ | The source point of the potential sequence $\vec{r}$. |
| $\mathcal{R}$ | A set of all potential sequences. |
| $\mathcal{R}^{L}$ | A set of potential sequences with length $L$. |
| $\mathcal{R}_{\vec{r}}$ | A set of potential sequences that have the same length, source point and pick-up point set as $\vec{r}$. |
| $\mathcal{R}_{c}^{L}$ | A set of potential sequences with length $L$ and source point $c$. |

The rest of the paper is organized as follows. Section 2 introduces the background and the 
related work. Section 3 gives the iterative nature of the PTD function and the proposed sequence 
pruning principle. In Section 4, the offline sequence generation and online search algorithms are 
described in detail. Section 5 gives the experimental results and analysis. Section 6 discusses some
extensions of our method. Finally, Section 7 concludes the paper.

2 BACKGROUND 

In this section, we first introduce the MSR problem and then describe the previous works. 



2.1 Problem Statement 

Let $c_i$ be a potential pick-up position and $C = \{c_1, c_2, \ldots, c_N\}$ be a set of $N$
pick-up points. The probability that a taxi can successfully pick up passengers at the pick-up
point $c_i$ is denoted by $P(c_i)$, and the set of mutually independent probabilities is
$P = \{P(c_1), P(c_2), \ldots, P(c_N)\}$. Which driving route will lead to the minimum cost of
picking up a new passenger when a taxi travels all or part of the pick-up points in $C$ starting
from its current location? This is the MSR problem introduced by Ge et al. [2]. The problem can
also be found in other scenarios such as recommending tourist routes, searching parking places,
etc. In the following, we introduce some concepts of the MSR problem; all the symbols used in
this paper are described in Table 1.

Let $\vec{r} = (c_1, c_2, \ldots, c_L)$ be a potential sequence with length $L$ derived from the
pick-up point set $C$, where the points $c_i$ in $\vec{r}$ are pairwise different; $c_1$ is called
the source point of $\vec{r}$ and $c_L$ is called the destination point.
$C_{\vec{r}} = \{c_1, c_2, \ldots, c_L\}$ denotes the pick-up point set of the potential sequence
$\vec{r}$. $\mathcal{R} = \{\vec{r} \mid C_{\vec{r}} \subseteq C \wedge C_{\vec{r}} \neq \emptyset\}$
is the set of all potential sequences derived from $C$, and $|\mathcal{R}| = M$ is the number of
all possible potential sequences in $\mathcal{R}$.
$P(\vec{r}) = (P(c_1), P(c_2), \ldots, P(c_L))$ is the probability vector of the potential
sequence $\vec{r}$, consisting of the probabilities of all the pick-up points in $\vec{r}$.
$d = (c_0, \vec{r})$ is a driving route, where $c_0$ is the current location of a taxi, $\vec{r}$
is the sequence of pick-up points, and $|d| = |\vec{r}|$ is the length of $d$.

Figure 1: An example of the driving route and its PTD cost. (Node labels recovered from the
figure: $P(c_1) = 0.5$, $P(c_2) = 0.3$.)

For a driving route $d = (c_0, \vec{r})$, a PTD function is defined in [2] to evaluate its travel
cost. Let

$$D(d) = \Big\langle D_{c_0,c_1},\ (D_{c_0,c_1} + D_{c_1,c_2}),\ \ldots,\ \sum_{i=1}^{L} D_{c_{i-1},c_i},\ D_\infty \Big\rangle$$

be the distance vector of $d$ and

$$P(d) = \Big\langle P(c_1),\ \overline{P(c_1)} \cdot P(c_2),\ \ldots,\ \prod_{i=1}^{L-1} \overline{P(c_i)} \cdot P(c_L),\ \prod_{i=1}^{L} \overline{P(c_i)} \Big\rangle$$

be its probability vector, where $\overline{P(c_i)} = 1 - P(c_i)$ is the probability of failing to
pick up a passenger at $c_i$. Then the PTD cost of $d$ can be calculated by

$$F(d) = F(c_0, \vec{r}, P(\vec{r})) = D(d) \cdot P(d), \qquad (1)$$

where $D_\infty$ represents the desired maximum cruising distance of a driver for picking up new
passengers, and it can be manually specified. The PTD value of a driving route $d$ represents the
expected travel distance of an empty cab before it picks up new passengers when it is driving
along the route. The smaller the PTD cost of a driving route, the shorter the travel distance and
the lower the energy and cost required for the cab to find new passengers along it.

The objective of the simple MSR problem is to recommend a driving route derived from the set
of pick-up points $C$ for a cab driver, so that the expected potential travel distance (PTD) for
finding new passengers is minimal. An illustrative example is shown in Figure 1: there are two
different driving routes $d_1 = (c_0, c_1, c_2)$ and $d_2 = (c_0, c_2, c_3)$ with length 2. Let
$D_\infty = 10$. We get

$$D(d_1) = \langle D_{c_0,c_1},\ (D_{c_0,c_1} + D_{c_1,c_2}),\ D_\infty \rangle = (2, 7, 10),$$
$$P(d_1) = \langle P(c_1),\ \overline{P(c_1)} \cdot P(c_2),\ \overline{P(c_1)} \cdot \overline{P(c_2)} \rangle = (0.5, 0.15, 0.35),$$
$$D(d_2) = \langle D_{c_0,c_2},\ (D_{c_0,c_2} + D_{c_2,c_3}),\ D_\infty \rangle = (4, 5, 10),$$
$$P(d_2) = \langle P(c_2),\ \overline{P(c_2)} \cdot P(c_3),\ \overline{P(c_2)} \cdot \overline{P(c_3)} \rangle = (0.3, 0.56, 0.14).$$

So the PTD cost of $d_1$ is $F(d_1) = 2 \times 0.5 + 7 \times 0.15 + 10 \times 0.35 = 5.55$ and the
PTD cost of $d_2$ is $F(d_2) = 4 \times 0.3 + 5 \times 0.56 + 10 \times 0.14 = 5.4$. Since the PTD
cost of $d_2$ is smaller than that of $d_1$, $d_2$ should be recommended.
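The example can be reproduced with a minimal Python sketch of Formula (1). The function name `ptd_cost` and its argument layout are illustrative choices, not part of the paper; the leg lengths $D_{c_1,c_2} = 5$ and $D_{c_2,c_3} = 1$ and the probability $P(c_3) = 0.8$ are inferred from the vectors printed in the example rather than stated directly.

```python
def ptd_cost(legs, probs, d_inf):
    """PTD cost F(d) = D(d) . P(d) of a driving route (Formula (1)).

    legs[i]  : distance of the i-th leg (legs[0] is cab position -> first pick-up point)
    probs[i] : pick-up probability at the i-th pick-up point
    d_inf    : penalty distance D_inf used when no passenger is found on the route
    """
    cost, travelled, miss = 0.0, 0.0, 1.0
    for leg, p in zip(legs, probs):
        travelled += leg                 # cumulative distance to this pick-up point
        cost += travelled * miss * p     # first successful pick-up happens here
        miss *= 1.0 - p                  # probability that every point so far failed
    return cost + d_inf * miss

# Values inferred from the Figure 1 example (see the note above).
f_d1 = ptd_cost([2, 5], [0.5, 0.3], 10)   # d1 = (c0, c1, c2)
f_d2 = ptd_cost([4, 1], [0.3, 0.8], 10)   # d2 = (c0, c2, c3)
print(round(f_d1, 2), round(f_d2, 2))
```

Under these assumptions the two costs agree with the values 5.55 and 5.4 derived in the text.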

Since the computational complexity of the simple MSR problem is $O(N!)$ [2], a brute-force
method for searching the optimal route in $\mathcal{R}$ is inefficient. In [2], Ge et al. focus on
the MSR problem with a length constraint due to the high complexity of the simple MSR problem.

However, in real life, a user usually prefers to request a route within a length range. For example,
a cab driver wants to get an optimal driving route in the nearby area with length between 3 and 5.
In this case, a recommendation method with a fixed length constraint will be inefficient in handling
such a service request, while a recommendation method for the unconstrained simple MSR problem is
also inefficient when the suggested length is less than 3 or more than 5. Therefore, we focus on a
more general MSR problem in this paper, with length between the minimum $L_{\min}$ and the maximum
$L_{\max}$. The generalized MSR problem is given as follows.



The generalized MSR problem

Given:
  A set of potential pick-up points $C = \{c_1, c_2, \ldots, c_N\}$;
  A probability set $P = \{P(c_1), P(c_2), \ldots, P(c_N)\}$;
  A potential sequence set $\mathcal{R} = \{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_M\}$;
  The position $c_0$ of a cab that needs the service;
  The suggested minimal length $L_{\min} \in \{1, 2, \ldots, N\}$;
  The suggested maximal length $L_{\max} \in \{L_{\min}, \ldots, N\}$.

Objective: Recommending an optimal driving route $d = \langle c_0, \vec{r} \rangle$, s.t.

  $$\min_{\vec{r} \in \mathcal{R}} F(c_0, \vec{r}, P(\vec{r})),$$

  where $\vec{r} \in \mathcal{R}$ and $L_{\min} \le |d| \le L_{\max}$.





Actually, the above problem is a computational extension of the simple MSR problem with
more flexible parameter specification. When we set $L_{\min} = 1$ and $L_{\max} = N$, it is the
simple MSR problem. Moreover, the length constrained MSR problem can be obtained by setting
$L_{\min} = L_{\max} = L$ [2]. In this paper, we present a method to handle any case of
$1 \le L_{\min} \le L_{\max} \le N$. In particular, in order to compare the cost of potential
sequences of various lengths, we set $D_\infty$ to be equal for all the suggested routes of
arbitrary length.



to Lr, 



IS 



Since the number of driving routes satisfying the length constraint from L 

Lmax / N \ 

^ I ] ■ L\, the computational complexity of the generalized MSR problem is no more 

L = Lmin V / 

than the complexity of the simple MSR problem 0{N]) and is no less than the complexity of the 
MSR problem with fixed length Lmax- Therefore, it cannot be effectively solved by the brute- force 
search method. 
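To get a feeling for how quickly this count grows, the sum above can be evaluated directly. The following is a small sketch; the function name `num_routes` is an illustrative choice.

```python
from math import comb, factorial

def num_routes(n, l_min, l_max):
    """Number of candidate sequences with length in [l_min, l_max] over n pick-up points."""
    return sum(comb(n, length) * factorial(length) for length in range(l_min, l_max + 1))

print(num_routes(10, 3, 5))    # 10 pick-up points, lengths 3..5
print(num_routes(20, 3, 5))    # doubling the number of points
```

With only 10 pick-up points and lengths between 3 and 5 there are already 36000 candidate sequences, which illustrates why exhaustive enumeration does not scale.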



2.2 Related Work 

In recent years, intelligent transportation systems and trajectory data mining have attracted
widespread attention [H El [HI [9] . Mobile navigation and route recommendation have become a hot
topic in this research field [2l[IIl[l2l[l3l[Ill[l5l[2ll[ini[22l[23].

The MSR problem presented by Ge et al. in [2] is rather different from traditional problems
such as the Shortest-Path problem [16][17], the Traveling-Salesman problem [18] and the
Vehicle-Scheduling problem [19]. In the shortest path computation problem, the source and
destination nodes of an object are known in advance, whereas in the MSR problem both of them are
unknown. The traditional Traveling-Salesman Problem (TSP) seeks a shortest path that includes all
$N$ locations, while the MSR problem is to find a path that consists of a subset of the given $N$
locations. In addition, the traditional Vehicle-Scheduling problem needs to determine a set of
duties in advance, while the pick-up routes (jobs) among several locations are uncertain in the
MSR problem.

In [2], the authors focus on the MSR problem with a length constraint due to the high
computational complexity of the unconstrained simple MSR problem. To reduce the search space, they
proposed a route dominance based sequence pruning algorithm, LCP. However, the proposed algorithm
has difficulty in handling the problem with a large number of pick-up points. A novel skyline
based algorithm, SkyRoute, is also introduced for searching the optimal route, which can serve
multiple cabs online. However, the skyline query is inefficient, since it is processed online.

Yuan et al. proposed a probability model for detecting pick-up points 0]. It finds a route with 
the biggest pick-up probability to the parking position constrained by a distance threshold instead 
of the minimal cost of the route and provides location recommendation service both for the cab 
drivers and for the people needing taxi services. The problem solved in [21][22] is also
different from the MSR problem: it is to recommend the fastest route to a destination given
starting position and time constraints.

Powell et al. proposed a grid-based approach to suggest profitable locations for taxi drivers by
constructing a spatio-temporal profitability map, on which, the nearby regions of the driver are 
scored according to the potential profit calculated by the historical data. However, this method 
only finds a parking place with the biggest profit in a local scope instead of a set of pick-up points 
with overall consideration. 

Lu et al. [11] introduced the problem of finding an optimal trip route with a time constraint. They
also proposed an efficient trip planning method considering the current position of a user. However,
their method uses the score of attractions to measure the preference of a route.

3 PROPOSED METHOD 

To address the computational challenge of the generalized MSR problem, we first identify the 
iterative property of the PTD function, which makes the incremental generation of the potential 
sequences possible and then propose the pruning principle, which uses the iterative property to 
efficiently reduce the search space. 

3.1 The Iterative Property Of The PTD Function 

As described in Section 2, the PTD function gives a computable measure for the cost of a route. In
the following, we study the property of the PTD function.

Actually, an iterative computational formula of the PTD function [5] can be obtained without
considering the driving distance beyond the last pick-up point of a driving route. For this purpose,
we introduce the concept of the PTD sub-function.

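Definition 1. (PTD Sub-function $F_1$) Given a driving route $d = (c_0, c_1, c_2, \ldots, c_L)$,
$c_0$ is its starting point and $\vec{r} = (c_1, c_2, \ldots, c_L)$ is its pick-up sequence. Let
the distance sub-vector of $D(d)$ be

$$\tilde{D}(d) = \Big\langle D_{c_0,c_1},\ (D_{c_0,c_1} + D_{c_1,c_2}),\ \ldots,\ \sum_{i=1}^{L} D_{c_{i-1},c_i} \Big\rangle$$

and the probability sub-vector of $P(d)$ be

$$\tilde{P}(d) = \Big\langle P(c_1),\ \overline{P(c_1)} \cdot P(c_2),\ \ldots,\ \prod_{i=1}^{L-1} \overline{P(c_i)} \cdot P(c_L) \Big\rangle.$$

The PTD sub-function $F_1$ of the driving route $d$ is defined as

$$F_1(d) = \tilde{D}(d) \cdot \tilde{P}(d). \qquad (2)$$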

Compared to the distance vector $D(d)$ and the probability vector $P(d)$, the sub-vectors
$\tilde{D}(d)$ and $\tilde{P}(d)$ of a driving route $d = (c_0, \vec{r})$ only lack the last
component, respectively. Therefore, the PTD cost of a driving route $d$ can be expressed using
its PTD sub-function by the following equation:

$$F(d) = F_1(d) + D_\infty \cdot \prod_{i=1}^{L} \overline{P(c_i)}. \qquad (3)$$

In fact, we do not have the starting point of a cab in the stage of offline processing. To
enhance the online search efficiency, we pre-compute the costs of all the potential sequences. The
involved concept of the probability summation function $PE$ is introduced as follows.

Definition 2. (Probability Summation Function $PE$) Let $\vec{r} = (c_1, c_2, \ldots, c_L)$ be a
potential sequence with length $L$ and $d = (c_0, \vec{r})$ be a driving route derived from
$\vec{r}$. The probability summation of $\vec{r}$ is the sum of all the dimensions in the
probability sub-vector $\tilde{P}(d)$, and it is given as

$$PE(\vec{r}) = P(c_1) + \overline{P(c_1)} \cdot P(c_2) + \cdots + \prod_{i=1}^{L-1} \overline{P(c_i)} \cdot P(c_L). \qquad (4)$$

Since the sum of all the components in the probability vector $P(d)$ is equal to 1, the value of
the probability summation function $PE$ of $\vec{r}$ has the following property:

$$PE(\vec{r}) = PE(c_1, c_2, \ldots, c_L) = 1 - \prod_{i=1}^{L} \overline{P(c_i)}. \qquad (5)$$

The value of the function $PE$ can be calculated recursively. Given a potential sequence
$\vec{r}_1 = (c_1, c_2, \ldots, c_k)$ and its postfix sub-sequence
$\vec{r}_2 = (c_2, c_3, \ldots, c_k)$, $PE(\vec{r}_1)$ can be iteratively calculated by

$$PE(\vec{r}_1) = P(c_1) + \overline{P(c_1)} \cdot PE(\vec{r}_2). \qquad (6)$$
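The agreement between the recursion in Formula (6) and the closed form in Formula (5) can be checked numerically. The sketch below uses illustrative function names and made-up probabilities; it is not taken from the paper.

```python
from math import prod

def pe_recursive(probs):
    """PE via Formula (6), evaluated from the last pick-up point backward."""
    pe = 0.0
    for p in reversed(probs):
        pe = p + (1.0 - p) * pe      # PE(c_i, ..., c_L) from PE(c_{i+1}, ..., c_L)
    return pe

def pe_closed_form(probs):
    """PE via Formula (5): one minus the probability that every pick-up point fails."""
    return 1.0 - prod(1.0 - p for p in probs)

probs = [0.5, 0.3, 0.8]              # hypothetical pick-up probabilities
print(pe_recursive(probs), pe_closed_form(probs))
```

For a single pick-up point the loop returns $P(c)$ itself, which matches the initial value used by the backward recursion.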

According to the above definitions, we can obtain the iterative computation theorem of the 
potential sequences as follows. 

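Theorem 3.1. Let $\vec{r} = (c_1, c_2, \ldots, c_L)$ be a potential sequence with length $L$. The
distance sub-vector of $\vec{r}$ is

$$\tilde{D}(\vec{r}) = \Big\langle D_{c_1,c_2},\ (D_{c_1,c_2} + D_{c_2,c_3}),\ \ldots,\ \sum_{i=2}^{L} D_{c_{i-1},c_i} \Big\rangle$$

and its probability sub-vector is

$$\tilde{P}(\vec{r}) = \Big\langle P(c_2),\ \overline{P(c_2)} \cdot P(c_3),\ \ldots,\ \prod_{i=2}^{L-1} \overline{P(c_i)} \cdot P(c_L) \Big\rangle.$$

Then the PTD sub-function $F_1$ of $\vec{r}$ is

$$F_1(\vec{r}) = \tilde{D}(\vec{r}) \cdot \tilde{P}(\vec{r}).$$

Given a potential sequence $\vec{r}_1 = (c_1, c_2, \ldots, c_k)$, $1 < k \le N$, and its postfix
sub-sequence $\vec{r}_2 = (c_2, c_3, \ldots, c_k)$, $F_1(\vec{r}_1)$ can be iteratively
calculated as

$$F_1(\vec{r}_1) = \overline{P(c_2)} \cdot F_1(\vec{r}_2) + D_{c_1,c_2} \cdot PE(\vec{r}_2). \qquad (7)$$

Proof. Based on the definition of the PTD sub-function $F_1$, we have

$$
\begin{aligned}
F_1(\vec{r}_1) &= \tilde{D}(\vec{r}_1) \cdot \tilde{P}(\vec{r}_1) \\
&= D_{c_1,c_2} \cdot P(c_2) + (D_{c_1,c_2} + D_{c_2,c_3}) \cdot \overline{P(c_2)} \cdot P(c_3) + \cdots + \sum_{i=2}^{k} D_{c_{i-1},c_i} \cdot \prod_{i=2}^{k-1} \overline{P(c_i)} \cdot P(c_k) \\
&= \Big( D_{c_2,c_3} \cdot \overline{P(c_2)} \cdot P(c_3) + \cdots + \sum_{i=3}^{k} D_{c_{i-1},c_i} \cdot \prod_{i=2}^{k-1} \overline{P(c_i)} \cdot P(c_k) \Big) \\
&\quad + D_{c_1,c_2} \cdot \Big( P(c_2) + \overline{P(c_2)} \cdot P(c_3) + \cdots + \prod_{i=2}^{k-1} \overline{P(c_i)} \cdot P(c_k) \Big) \\
&= \overline{P(c_2)} \cdot \Big( D_{c_2,c_3} \cdot P(c_3) + \cdots + \sum_{i=3}^{k} D_{c_{i-1},c_i} \cdot \prod_{i=3}^{k-1} \overline{P(c_i)} \cdot P(c_k) \Big) \\
&\quad + D_{c_1,c_2} \cdot \Big( P(c_2) + \overline{P(c_2)} \cdot P(c_3) + \cdots + \prod_{i=2}^{k-1} \overline{P(c_i)} \cdot P(c_k) \Big) \\
&= \overline{P(c_2)} \cdot F_1(\vec{r}_2) + D_{c_1,c_2} \cdot PE(\vec{r}_2). \qquad \square
\end{aligned}
$$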



According to Formulas (6) and (7), we can get the backward recursive formula for calculating
the PTD sub-function $F_1$ of the potential sequence $\vec{r}$.

The initial value:

$$\forall c \in C: \quad F_1(c) = 0, \quad PE(c) = P(c).$$

Iterative formula:

$$F_1(c_1, c_2, \ldots, c_L) = \overline{P(c_2)} \cdot F_1(c_2, c_3, \ldots, c_L) + D_{c_1,c_2} \cdot PE(c_2, c_3, \ldots, c_L),$$
$$PE(c_1, c_2, \ldots, c_L) = P(c_1) + \overline{P(c_1)} \cdot PE(c_2, \ldots, c_L).$$
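The backward recursion can be written as a short loop that starts at the last pick-up point and prepends one point per step. The sketch below is illustrative only: the names `f1_pe` and `f1_direct`, the dictionary-based distance table and the numeric values are assumptions, and the direct evaluation is included solely to cross-check the recursion against the definition in Theorem 3.1.

```python
def f1_pe(seq, prob, dist):
    """Return (F1, PE) of a potential sequence via the backward recursive formula."""
    f1, pe = 0.0, prob[seq[-1]]                  # initial value for a single point
    for i in range(len(seq) - 2, -1, -1):
        head, nxt = seq[i], seq[i + 1]
        # F1 must be updated first: it needs the PE of the postfix, not of the new sequence.
        f1 = (1.0 - prob[nxt]) * f1 + dist[head, nxt] * pe
        pe = prob[head] + (1.0 - prob[head]) * pe
    return f1, pe

def f1_direct(seq, prob, dist):
    """F1 evaluated directly from the sub-vectors of Theorem 3.1 (for checking only)."""
    total, travelled, miss = 0.0, 0.0, 1.0
    for prev, cur in zip(seq, seq[1:]):
        travelled += dist[prev, cur]
        total += travelled * miss * prob[cur]
        miss *= 1.0 - prob[cur]
    return total

# Hypothetical data: three pick-up points with made-up probabilities and distances.
prob = {"c1": 0.5, "c2": 0.3, "c3": 0.8}
dist = {("c1", "c2"): 5.0, ("c2", "c3"): 1.0}
f1, pe = f1_pe(("c1", "c2", "c3"), prob, dist)
print(round(f1, 2), round(pe, 2))
```

Because each step only reads the $F_1$ and $PE$ values of the postfix, a sequence of length $L$ costs one constant-time update once its postfix of length $L - 1$ is available.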



The recursive formula given above shows that the $F_1$ value of the potential sequence
$\vec{r} = (c_1, c_2, \ldots, c_L)$ can be recursively calculated from the $F_1$ and $PE$ values of
its postfix sub-sequence $\vec{r}' = (c_2, c_3, \ldots, c_L)$. In the stage of offline analysis, we
only have the set of potential pick-up points $C$, but the locations of the cabs are unknown.
Therefore, we can construct short postfix sequences and then incrementally add new pick-up points
ahead of them, which leads to longer potential sequences. Actually, if we want to recommend a
driving route with length $L$, we need to generate all potential sequences with length $L$. Once
we get the current location $c_0$ of a cab online, we can obtain the driving routes satisfying the
length constraint by inserting the current location $c_0$ of the cab at the head of the potential
sequences with length $L$ as the starting point.

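The PTD sub-function of the driving route $d = (c_0, \vec{r})$ can be calculated using the values
of $F_1(\vec{r})$ and $PE(\vec{r})$ via

$$F_1(d) = \overline{P(c_1)} \cdot F_1(\vec{r}) + D_{c_0,c_1} \cdot PE(\vec{r}). \qquad (8)$$

By combining Formula (3) with Formula (8), the PTD cost of the driving route $d$ can be calculated
using the formula

$$
\begin{aligned}
F(d) &= \overline{P(c_1)} \cdot F_1(\vec{r}) + D_{c_0,c_1} \cdot PE(\vec{r}) + D_\infty \cdot \prod_{i=1}^{L} \overline{P(c_i)} \\
&= \overline{P(c_1)} \cdot F_1(\vec{r}) + D_{c_0,c_1} \cdot PE(\vec{r}) + D_\infty \cdot (1 - PE(\vec{r})).
\end{aligned}
\qquad (9)
$$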

For the driving route $d = (c_0, c_1, c_2, \ldots, c_L)$ with length $L$, we can efficiently
calculate the value of the PTD sub-function $F_1$ of its pick-up sequence
$\vec{r} = (c_1, c_2, \ldots, c_L)$ in advance. When the current location $c_0$ of the cab is
received online, we can calculate the PTD value of $d$ based on Formula (9). Then we can recommend
the driving route satisfying the length constraint with the minimum PTD cost to the user.
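A minimal sketch of this online step is given below, assuming the offline stage has cached $F_1$ and $PE$ for each candidate sequence. The cache layout, the names and the leg lengths are illustrative assumptions; the numbers are the ones inferred from the Figure 1 example ($F_1(c_1, c_2) = 5 \times 0.3$, $F_1(c_2, c_3) = 1 \times 0.8$).

```python
def ptd_from_cache(d_c0_c1, p_c1, f1_r, pe_r, d_inf):
    """Formula (9): PTD cost of d = (c0, r) from the cached F1(r) and PE(r)."""
    return (1.0 - p_c1) * f1_r + d_c0_c1 * pe_r + d_inf * (1.0 - pe_r)

# Offline cache: sequence label -> (source point, P(source), F1, PE).
cache = {
    "(c1,c2)": ("c1", 0.5, 1.5, 0.65),
    "(c2,c3)": ("c2", 0.3, 0.8, 0.86),
}
# Known only online: distance from the cab position c0 to each possible source point.
dist_from_cab = {"c1": 2.0, "c2": 4.0}

costs = {
    label: ptd_from_cache(dist_from_cab[src], p_src, f1, pe, d_inf=10.0)
    for label, (src, p_src, f1, pe) in cache.items()
}
best = min(costs, key=costs.get)
print(best, round(costs[best], 2))
```

Each candidate is scored with three multiplications, so the online cost is linear in the number of sequences that survive offline pruning.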

Using the iterative property of the PTD function, we give a recursive computational formula
for the PTD cost as well as an incremental backward path growth method which can generate a
potential sequence from its postfix sub-sequence. In this way, we do not have to calculate the PTD
cost for each possible driving route from scratch, but recursively calculate it from the $F_1$ and
$PE$ values of its postfix sub-sequences. Therefore, the cost of calculating the PTD of the routes
can be reduced significantly.

4 SEQUENCE PRUNING 

In [2], Ge et al. proposed a sequence pruning algorithm LCP based on route dominance. Let
us briefly illustrate the principle of route dominance based pruning used in LCP. In
Figure 2, two potential sequences with length three, $\vec{r}_1 = (c_1, c_2, c_5)$ and
$\vec{r}_2 = (c_1, c_4, c_5)$, have the same source and destination pick-up points. The associated
DP vectors are defined as
$DP(\vec{r}_1) = \langle D_{c_1,c_2}, \overline{P(c_2)}, D_{c_2,c_5}, \overline{P(c_5)} \rangle$ and
$DP(\vec{r}_2) = \langle D_{c_1,c_4}, \overline{P(c_4)}, D_{c_4,c_5}, \overline{P(c_5)} \rangle$.
Because

$$(D_{c_1,c_2} \le D_{c_1,c_4}) \wedge (\overline{P(c_2)} \le \overline{P(c_4)}) \wedge (D_{c_2,c_5} \le D_{c_4,c_5}) \wedge (\overline{P(c_5)} \le \overline{P(c_5)})$$

and

$$(D_{c_1,c_2} < D_{c_1,c_4}) \vee (\overline{P(c_2)} < \overline{P(c_4)}) \vee (D_{c_2,c_5} < D_{c_4,c_5}) \vee (\overline{P(c_5)} < \overline{P(c_5)})$$

are both valid, we can infer that $\vec{r}_1$ dominates $\vec{r}_2$. Thus, $\vec{r}_2$ will be
pruned in advance by the algorithm LCP.

In the LCP algorithm, all possible potential sequences have to be generated. Since whether a route
is dominated by another route depends on the value of each dimension of the DP vector, the pruning
effect is limited. If we can identify and remove some non-optimal potential sequences incrementally
in the stage of sequence generation, the pruning effect would be improved. Along this line, we
introduce the sequence pruning principle adopted in our method.

Definition 3. (Sequence Precedence) Given two potential sequences
$a = (c_{a_1}, \ldots, c_{a_k})$ and $b = (c_{b_1}, \ldots, c_{b_k})$ with equal length $k$
($1 \le k \le N$), for a starting position $c_0$, we will get two driving routes
$d_1 = (c_0, c_{a_1}, \ldots, c_{a_k})$ and $d_2 = (c_0, c_{b_1}, \ldots, c_{b_k})$ with equal
length $k$ derived from $a$ and $b$ respectively. If $F(d_1) < F(d_2)$ holds for any possible
$c_0$, then $a$ precedes $b$, and it is denoted as $a \prec b$.

If $a \prec b$, the potential sequence $b$ cannot form an optimal driving route and it should be
removed from the collection of the sequence candidates in advance. For example, as shown in
Figure 2, there is another potential sequence $\vec{r}_3 = (c_1, c_3, c_5)$. Since for any possible
starting position $c_0$ the PTD value of the driving route $d_1 = (c_0, c_1, c_2, c_5)$ must be
smaller than that of the driving route $d_3 = (c_0, c_1, c_3, c_5)$, $d_3$ is not an optimal
driving route. Therefore, we can prune the potential sequence $\vec{r}_3$ in the stage of offline
processing.



Figure 2: An example of the sequence dominance and precedence.

It is easy to see that if $\vec{r}$ dominates $\vec{r}'$, then $\vec{r} \prec \vec{r}'$ is also
valid. On the contrary, if $\vec{r} \prec \vec{r}'$, $\vec{r}$ does not necessarily dominate
$\vec{r}'$. For example, as shown in Figure 2, even though $\vec{r}_1$ does not dominate
$\vec{r}_3$, $\vec{r}_1 \prec \vec{r}_3$ is still valid. It shows that sequence dominance is only a
special case of sequence precedence. As a result, the overall pruning effect of sequence
precedence should be better than that of route dominance.

In order to efficiently evaluate the costs of the potential sequences, we provide a criterion of 
iterative precedence as follows. 

Definition 4. (Iterative Precedence) Let $a = (c_{a_1}, \ldots, c_{a_k})$ and
$b = (c_{b_1}, \ldots, c_{b_k})$ be two potential sequences derived from the set of pick-up points
$C$. If $\big(F_1(a) < F_1(b)\big) \wedge \big(1 - PE(a) \le 1 - PE(b)\big)$ or
$\big(F_1(a) \le F_1(b)\big) \wedge \big(1 - PE(a) < 1 - PE(b)\big)$, then $a$ takes iterative
precedence over $b$, denoted by $a \propto b$.

According to the definition of iterative precedence, we propose a method to determine the
precedence relationship between pairs of potential sequences in order to prune some sequence
candidates in the process of sequence generation. Note that since the PTD function $F$ and the
iterative calculation of the PTD sub-function $F_1$ are both relevant to the source point of the
sequence, we only compare the PTD costs of the potential sequences with the same source point.
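The iterative precedence test only needs the two cached numbers of each sequence. The sketch below is one possible reading of Definition 4: the placement of the strict and non-strict inequalities is an assumption (one of the two conditions must be strict in each clause), and the function name is illustrative. Since $1 - PE(a) \le 1 - PE(b)$ is equivalent to $PE(a) \ge PE(b)$, the comparison is done on $PE$ directly.

```python
def iteratively_precedes(f1_a, pe_a, f1_b, pe_b):
    """True if sequence a takes iterative precedence over sequence b.

    Both sequences are assumed to share the same source point and length.
    """
    return (f1_a < f1_b and pe_a >= pe_b) or (f1_a <= f1_b and pe_a > pe_b)

# Hypothetical (F1, PE) pairs.
print(iteratively_precedes(1.0, 0.9, 2.0, 0.8))   # smaller F1 and larger PE
print(iteratively_precedes(1.0, 0.8, 2.0, 0.9))   # smaller F1 but smaller PE: no decision
```

When neither sequence precedes the other, both have to be kept as candidates.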

Theorem 4.1. Let $1 \le k \le N - 1$, and let $a = (c_s, c_{a_1}, \ldots, c_{a_k})$ and
$b = (c_s, c_{b_1}, \ldots, c_{b_k})$ be two potential sequences with the same source point and
equal length $k + 1$. Let $a' = (c_1, c_2, \ldots, c_m, c_s, c_{a_1}, \ldots, c_{a_k})$ and
$b' = (c_1, c_2, \ldots, c_m, c_s, c_{b_1}, \ldots, c_{b_k})$ be two potential sequences derived
from $a$ and $b$ by adding the same prefix sequence $\vec{r} = (c_1, c_2, \ldots, c_m)$
($0 \le m \le N - k - 1$, $c_m \in C$), respectively. If $a \propto b$, then $a' \prec b'$.

Proof. Let $c_0$ be an arbitrary starting point, and let
$d_a = (c_0, c_1, c_2, \ldots, c_m, c_s, c_{a_1}, \ldots, c_{a_k})$ and
$d_b = (c_0, c_1, c_2, \ldots, c_m, c_s, c_{b_1}, \ldots, c_{b_k})$ be two driving routes
associated with the potential sequences $a'$ and $b'$ respectively. $d = (c_0, c_1, c_2, \ldots, c_m)$
is a driving route with pick-up point sequence $\vec{r} = (c_1, c_2, \ldots, c_m)$.

Let $D_0 = D_{c_0,c_1} + D_{c_1,c_2} + D_{c_2,c_3} + \cdots + D_{c_m,c_s}$ and
$P_0 = \overline{P(c_1)} \cdot \overline{P(c_2)} \cdot \overline{P(c_3)} \cdots \overline{P(c_m)}$, then

$$F(d_a) = F_1(d) + D_0 \cdot P_0 + (D_\infty - D_0) \cdot P_0 \cdot \overline{P(c_s)} \cdot \overline{P(c_{a_1})} \cdot \overline{P(c_{a_2})} \cdots \overline{P(c_{a_k})} + F_1(a) \cdot P_0 \cdot \overline{P(c_s)},$$
$$F(d_b) = F_1(d) + D_0 \cdot P_0 + (D_\infty - D_0) \cdot P_0 \cdot \overline{P(c_s)} \cdot \overline{P(c_{b_1})} \cdot \overline{P(c_{b_2})} \cdots \overline{P(c_{b_k})} + F_1(b) \cdot P_0 \cdot \overline{P(c_s)}.$$

As we know,

$$\overline{P(c_s)} \cdot \overline{P(c_{a_1})} \cdot \overline{P(c_{a_2})} \cdots \overline{P(c_{a_k})} = 1 - PE(a),$$
$$\overline{P(c_s)} \cdot \overline{P(c_{b_1})} \cdot \overline{P(c_{b_2})} \cdots \overline{P(c_{b_k})} = 1 - PE(b).$$

Then

$$F(d_a) = F_1(d) + D_0 \cdot P_0 + P_0 \cdot \big( F_1(a) \cdot \overline{P(c_s)} + (D_\infty - D_0) \cdot (1 - PE(a)) \big),$$
$$F(d_b) = F_1(d) + D_0 \cdot P_0 + P_0 \cdot \big( F_1(b) \cdot \overline{P(c_s)} + (D_\infty - D_0) \cdot (1 - PE(b)) \big).$$

So,

$$F(d_a) - F(d_b) = P_0 \cdot \overline{P(c_s)} \cdot \big( F_1(a) - F_1(b) \big) + P_0 \cdot (D_\infty - D_0) \cdot \big( PE(b) - PE(a) \big).$$

Since the desired travel distance increases along with the length of suggested driving routes,
we can get $D_\infty > D_0$. Thus, if $a \propto b$, i.e.,
$\big(F_1(a) < F_1(b)\big) \wedge \big(1 - PE(a) \le 1 - PE(b)\big)$ or
$\big(F_1(a) \le F_1(b)\big) \wedge \big(1 - PE(a) < 1 - PE(b)\big)$, then $F(d_a) < F(d_b)$.
That is to say, $a' \prec b'$. $\square$

Figure 3: An example of batch sequence pruning. (Node labels recovered from the figure:
$P(c_3)$, $P(c_4)$, $P(c_5)$.)



According to the feature of the precedence relationship between the potential sequences, we 
introduce the theory of batch and incremental sequence pruning as follows. 

Corollary 4.2. (Batch Pruning) Given two potential sequences $a = (c_s, c_{a_1}, \ldots, c_{a_k})$
and $b = (c_s, c_{b_1}, \ldots, c_{b_k})$ with the equal length $k + 1$ ($1 \le k \le N - 1$) and
the same source point $c_s$, if $a \propto b$, then $a \prec b$.

The above corollary shows that if $a \propto b$, the driving route $(c_0, b)$ derived from the
potential sequence $b$ is not an optimal driving route. Thus, $b$ should be pruned from the
sequence candidates with length $k + 1$.
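Batch pruning can be sketched as a filter over all candidates that share a length and a source point: a candidate is dropped as soon as some other candidate takes iterative precedence over it. The quadratic loop, the tuple layout and the numbers below are illustrative assumptions, not the paper's algorithm or data.

```python
def batch_prune(candidates):
    """Keep the candidates that no other candidate iteratively precedes.

    candidates: list of (sequence, F1, PE), all with the same length and source point.
    """
    kept = []
    for seq_b, f1_b, pe_b in candidates:
        preceded = any(
            (f1_a < f1_b and pe_a >= pe_b) or (f1_a <= f1_b and pe_a > pe_b)
            for _, f1_a, pe_a in candidates
        )
        if not preceded:
            kept.append((seq_b, f1_b, pe_b))
    return kept

# Hypothetical cached values for three sequences with source point c1 and length 3.
candidates = [
    (("c1", "c2", "c4"), 3.0, 0.9),
    (("c1", "c3", "c5"), 4.0, 0.8),   # larger F1 and smaller PE than the first one
    (("c1", "c4", "c2"), 2.5, 0.7),   # smaller F1 but also smaller PE: incomparable
]
survivors = [seq for seq, _, _ in batch_prune(candidates)]
print(survivors)
```

A candidate never removes itself, because both clauses of the test require at least one strict inequality.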

In the batch pruning, we can compare the PTD cost among potential sequences with length
$L$ ($2 \le L \le N$) using the values of $F_1$ and $PE$ calculated in the iterative process.
However, the batch pruning cannot be applied during the process of incremental sequence generation.
For example, as shown in Figure 3, there are two potential sequences $\vec{r} = (c_1, c_2, c_4)$
and $\vec{r}' = (c_1, c_3, c_5)$. We can generate a new potential sequence $(c_3, c_1, c_2, c_4)$
considering $\vec{r}$ as its postfix. However, we cannot append $c_3$ ahead of $\vec{r}'$, because
the pick-up point $c_3$ already exists in $\vec{r}'$. Even though $\vec{r} \propto \vec{r}'$, we
cannot prune $\vec{r}'$ in advance in the process of the incremental backward path growth. For
example, if $(c_2, c_1, c_3, c_5) \prec (c_3, c_1, c_2, c_4)$, we may miss the optimal route because
of the improper pruning of $\vec{r}'$ in advance. Along this line, we propose a new corollary
suitable for pruning potential sequences incrementally.

Figure 4: An example of incremental sequence growing and pruning.

Corollary 4.3. (Incremental Pruning) Given two potential sequences
$a = (c_s, c_{a_1}, \ldots, c_{a_k})$ and $b = (c_s, c_{b_1}, \ldots, c_{b_k})$ with the equal
length $k + 1$ ($2 \le k \le N - 1$) and the same source point $c_s$, if
$\{c_{a_1}, \ldots, c_{a_k}\} = \{c_{b_1}, \ldots, c_{b_k}\}$ and $F_1(a) < F_1(b)$, then no
driving route having the postfix sub-sequence $b$ can be an optimal driving route.

As we know, the major obstacle that makes batch pruning unsuitable for incrementally pruning
potential sequences is that we may not be able to append the same prefix sequence to all of the
potential sequences with the same source point and length. In Corollary 4.3, we add a constraint
that the involved potential sequences must have the same source point and the same set of pick-up
points. Then, it is obvious that all the involved potential sequences with the same length can
be appended with the same possible prefix sequence. Since
$1 - PE(a) = \overline{P(c_s)} \cdot \overline{P(c_{a_1})} \cdots \overline{P(c_{a_k})}$,
$1 - PE(b) = \overline{P(c_s)} \cdot \overline{P(c_{b_1})} \cdots \overline{P(c_{b_k})}$ and
$\{c_{a_1}, \ldots, c_{a_k}\} = \{c_{b_1}, \ldots, c_{b_k}\}$, we have $1 - PE(a) = 1 - PE(b)$.
Thus, we can simplify the iterative condition of precedence to $F_1(a) < F_1(b)$.

Let us study the example shown in Figure 4. There are three sequences with length 4: $r_1 = (c_4, c_1, c_2, c_3)$, $r_2 = (c_4, c_3, c_2, c_1)$ and $r_3 = (c_4, c_2, c_1, c_3)$. Any pick-up point $c \in C - \{c_1, c_2, c_3, c_4\}$ can be appended ahead of the three sequences to construct three new potential sequences with length 5. If $F_1(r_1) < F_1(r_2) < F_1(r_3)$, then $r_1 \propto r_2 \propto r_3$, i.e., $r_1 \prec r_2 \prec r_3$. That is to say, $r_2$ and $r_3$ can be pruned in advance. Because no driving route whose postfix is a pruned sequence can be optimal, such routes are removed incrementally. However, $r_1$ remains as a sequence candidate with length 4 and is considered as a possible postfix of other longer potential sequences.
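
The grouping rule of Corollary 4.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `incremental_prune`, the `(sequence, F1)` tuple layout, and the numeric $F_1$ values in the example are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Seq = Tuple[str, ...]


def incremental_prune(candidates: List[Tuple[Seq, float]]) -> List[Tuple[Seq, float]]:
    """Keep, per (source point, set of remaining points), the sequences with minimal F1."""
    best: Dict[Tuple[str, FrozenSet[str]], List[Tuple[Seq, float]]] = {}
    for seq, f1 in candidates:
        # Sequences are comparable only if they share the source and the point set.
        key = (seq[0], frozenset(seq[1:]))
        kept = best.get(key)
        if kept is None or f1 < kept[0][1]:
            best[key] = [(seq, f1)]      # strictly smaller F1: replace the group
        elif f1 == kept[0][1]:
            kept.append((seq, f1))       # tie: keep both sequences
    return [item for group in best.values() for item in group]


# Hypothetical F1 values for the three length-4 sequences of the example above.
example = [
    (("c4", "c1", "c2", "c3"), 1.0),
    (("c4", "c3", "c2", "c1"), 2.0),
    (("c4", "c2", "c1", "c3"), 3.0),
]
survivors = incremental_prune(example)
```

Only the sequence with the smallest $F_1$ value survives in its group, so only one postfix has to be extended in the next growth step.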

4.1 The Analysis of Pruning Effect 

In this subsection, we analyze the pruning ratios of our incremental and batch pruning methods, respectively. Let the total number of potential sequences be $M$ and the number of the remaining sequences after pruning be $M'$; the pruning ratio is then $\eta = (M - M')/M$.

Theorem 4.4. For all possible potential sequences with length L ($3 \le L \le N$), the incremental pruning ratio is $\eta = 1 - 1/(L-1)!$.

Proof. Given a set of potential pick-up points $C$ with $|C| = N$, the number of the potential sequences with length L ($3 \le L \le N$) is $M = \binom{N}{L} \cdot L!$. In the process of incremental pruning, we only consider a group of potential sequences with the same source point and the same set of pick-up points. Based on Corollary 4.3, the precedence relationships of these sequences depend only on their $F_1$ values. In most cases, we keep the single sequence with the minimum $F_1$ value among all these potential sequences. Since the number of permutations of the L - 1 pick-up points other than the shared source point is $(L-1)!$, the number of the remaining sequences is

$$M' = \binom{N}{L} \cdot L! \,/\, (L-1)! = \binom{N}{L} \cdot L.$$

Then the pruning ratio is

$$\eta = (M - M')/M = 1 - \left(\binom{N}{L} \cdot L\right) \Big/ \left(\binom{N}{L} \cdot L!\right) = 1 - 1/(L-1)!. \qquad \square$$

Note that the incremental pruning method is only applied to the potential sequences with length $L \ge 3$. According to Theorem 4.4, the incremental pruning ratio increases sharply with the length of the sequences. In order to remove more non-optimal sequences, we apply the batch pruning method to the sequences that remain after the incremental pruning process. As a result, the pruning ratio can be improved further.
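
As a quick numerical illustration of Theorem 4.4, the following Python sketch evaluates the incremental pruning ratio and the number of surviving candidates. The function names are hypothetical, and the count assumes that no two sequences in a group tie on $F_1$.

```python
from math import comb, factorial


def incremental_pruning_ratio(L: int) -> float:
    """Pruning ratio 1 - 1/(L-1)! for sequences of length L >= 3."""
    if L < 3:
        raise ValueError("incremental pruning applies to lengths L >= 3")
    return 1.0 - 1.0 / factorial(L - 1)


def remaining_after_incremental(N: int, L: int) -> int:
    """Number of candidates left after incremental pruning: C(N, L) * L."""
    total = comb(N, L) * factorial(L)      # all sequences of length L
    return total // factorial(L - 1)       # one survivor per group of (L-1)! sequences


ratio_3 = incremental_pruning_ratio(3)
ratio_6 = incremental_pruning_ratio(6)
left_10_4 = remaining_after_incremental(10, 4)
```

For L = 3 the ratio is 0.5, and for L = 6 it is about 0.992, which shows how quickly the ratio approaches 1.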

In the process of batch pruning, we compare the precedence relations between the remaining potential sequences with the same source point $c \in C$ and the same length L. Whether a potential sequence is removed by the batch pruning method depends on the $F_1$ and $P_E$ values of the sequences. As the length L increases, the probability that any pair of sequences with the same source point have equal $F_1$ and $P_E$ values becomes lower and lower. As a result, the number of the remaining sequence candidates after the incremental and batch pruning processes is close or equal to N when the length L is close to N.

5 THE ALGORITHM 

Based on the analysis above, we first present the offline generation algorithm of the potential 
sequence candidates and the online route query algorithm. Then, we analyze the computational 
complexity of our algorithms. 

5.1 The Offline Processing Algorithms 

The details of our dynamic programming based algorithm BP-Growth are given in Algorithm 1. It generates the potential sequence candidates in the offline stage, when the position of a cab is not yet involved. In order to construct all possible potential sequence candidates incrementally and efficiently, a backward path growth procedure and an incremental sequence pruning process are employed, combined with the iterative calculation of the $F_1$ and $P_E$ values of the potential sequences.

Algorithm 1 BP-Growth

Input: A set of potential pick-up points $C$, the probability set $P$ for all pick-up points, and the pairwise driving distance matrix $D$ of pick-up points.
Output: A set of the potential sequence candidates $R$ with length $L$ from 1 to $N$.

    $R^1 \leftarrow \emptyset$;
    for each $c_i \in C$ do
        $r \leftarrow \langle c_i \rangle$; $F_1(r) \leftarrow 0$; $P_E(r) \leftarrow P(c_i)$; $R^1 \leftarrow R^1 \cup \{r\}$;
    end for
    for $L = 2$ to $N$ do
        $R^L \leftarrow \emptyset$;
        for each $r \in R^{L-1}$ do
            for each $c_i \in (C - C_r)$ do
                // Potential Sequence Generation
                $p \leftarrow \langle c_i, r \rangle$; $c \leftarrow s(r)$;
                $F_1(p) \leftarrow F_1(r) \cdot (1 - P(c)) + D_{c_i,c} \cdot P_E(r)$;
                $P_E(p) \leftarrow P_E(r) \cdot (1 - P(c_i)) + P(c_i)$;
                // Incremental Sequence Pruning
                $R^L_p = \{q \mid q \in R^L, s(q) = s(p), C_q = C_p\}$;
                if $R^L_p = \emptyset$ then
                    $R^L \leftarrow R^L \cup \{p\}$;
                else if $\forall q \in R^L_p \, (F_1(p) = F_1(q))$ then
                    $R^L \leftarrow R^L \cup \{p\}$;
                else if $\forall q \in R^L_p \, (F_1(p) < F_1(q))$ then
                    $R^L \leftarrow (R^L - R^L_p) \cup \{p\}$;
                end if
            end for
        end for
    end for
    return $R = \bigcup_{L=1}^{N} R^L$;

After the sequence generation and pruning process of Algorithm 1, we obtain a set of sequence candidates with lengths from 1 to N. For these potential sequence candidates, we adopt the batch pruning algorithm to reduce the number of sequence candidates further. After the sequence candidates are produced offline, their $F_1$ and $P_E$ values have also been calculated iteratively. Therefore, we can directly compare the $F_1$ and $P_E$ values between the potential sequence candidates to prune the non-optimal ones during the batch pruning process, which is described in Algorithm 2.
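
The backward growth and incremental pruning steps of BP-Growth can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `bp_growth`, the use of integer indices for pick-up points, and the dictionary keyed by sequence length are assumptions made for the example. Sequences that share a source point and a point set are grouped, and only those with the minimal $F_1$ value are kept (ties are retained).

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Candidate = Tuple[Tuple[int, ...], float, float]  # (sequence, F1, PE)


def bp_growth(P: Sequence[float], D: Sequence[Sequence[float]]) -> Dict[int, List[Candidate]]:
    """Grow candidate sequences backward, pruning each (source, point set) group."""
    n = len(P)
    # Length-1 sequences: F1 = 0 and PE = P(c_i).
    levels: Dict[int, List[Candidate]] = {1: [((i,), 0.0, P[i]) for i in range(n)]}
    for L in range(2, n + 1):
        groups: Dict[Tuple[int, FrozenSet[int]], List[Candidate]] = {}
        for r, f1_r, pe_r in levels[L - 1]:
            c = r[0]                                   # source point s(r)
            for ci in range(n):
                if ci in r:                            # each point appears at most once
                    continue
                p = (ci,) + r                          # prepend ci to r
                f1_p = f1_r * (1.0 - P[c]) + D[ci][c] * pe_r
                pe_p = pe_r * (1.0 - P[ci]) + P[ci]
                key = (ci, frozenset(r))               # same source, same point set
                kept = groups.get(key)
                if kept is None or f1_p < kept[0][1]:
                    groups[key] = [(p, f1_p, pe_p)]    # new best for this group
                elif f1_p == kept[0][1]:
                    kept.append((p, f1_p, pe_p))       # tie: keep both
        levels[L] = [cand for group in groups.values() for cand in group]
    return levels


# Hypothetical toy instance with four pick-up points.
P = [0.3, 0.5, 0.2, 0.6]
D = [
    [0.0, 1.0, 2.5, 3.1],
    [1.0, 0.0, 1.7, 2.2],
    [2.5, 1.7, 0.0, 0.9],
    [3.1, 2.2, 0.9, 0.0],
]
levels = bp_growth(P, D)
```

When no ties occur, level L holds $\binom{N}{L} \cdot L$ candidates, which matches the count used in the proof of Theorem 4.4.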

5.2 The Online Search Algorithm 

Our method is able to provide a real-time driving route recommendation service for unloaded cabs at various positions. When a cab at the position $c_0$ requests the recommendation service, an online search algorithm is adopted to find an optimal driving route from the remaining potential sequences generated in the offline stage. Algorithm 3 shows the online search procedure in detail.

In Algorithm 3, for each L ($L_{min} \le L \le L_{max}$), we first generate the potential driving routes $D^L$ with length L by connecting $c_0$ with each potential sequence candidate in the set $R^L$. Then we calculate the PTD value of each potential driving route using the PTD formula. Finally, the driving routes with the minimal PTD value are selected and returned to the users.

Algorithm 2 BatchPruning

Input: A set of the potential sequences $R^L$ with length $L$.
Output: A set of the remaining sequence candidates $R'^L$ with length $L$.

    for each $c \in C$ do
        $R^L_c \leftarrow \emptyset$;
    end for
    for each $r \in R^L$ do
        $c \leftarrow s(r)$;
        $R^L_c \leftarrow R^L_c \cup \{r\}$;
        for each $q \in R^L_c \wedge q \ne r$ do
            if $q \propto r$ then
                $R^L_c \leftarrow R^L_c - \{r\}$;
                break;
            else
                if $r \propto q$ then
                    $R^L_c \leftarrow R^L_c - \{q\}$;
                end if
            end if
        end for
    end for
    return $R'^L = \bigcup_{c \in C} R^L_c$;
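
The batch pruning step can be sketched in Python as follows. This is an illustration only, and it rests on a loudly stated assumption: the precedence relation $\propto$ is taken here to mean "same source point, no larger $F_1$, no smaller $P_E$, and strictly better in at least one of the two", which is safe when $D_\infty$ is at least as large as any distance from the cab to a source point. The exact definition is given by the corollary preceding this section and may differ; the function name `batch_prune` is hypothetical.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[Tuple[int, ...], float, float]  # (sequence, F1, PE)


def dominates(a: Candidate, b: Candidate) -> bool:
    """Assumed precedence test: a is at least as good as b on F1 and PE, and not identical."""
    return a[1] <= b[1] and a[2] >= b[2] and (a[1] < b[1] or a[2] > b[2])


def batch_prune(candidates: List[Candidate]) -> List[Candidate]:
    """Remove sequences dominated by another sequence with the same source point."""
    by_source: Dict[int, List[Candidate]] = {}
    for cand in candidates:
        by_source.setdefault(cand[0][0], []).append(cand)
    remaining: List[Candidate] = []
    for group in by_source.values():
        for a in group:
            if not any(b is not a and dominates(b, a) for b in group):
                remaining.append(a)
    return remaining


# Hypothetical (sequence, F1, PE) values for four length-2 candidates.
cands = [
    ((0, 1), 1.0, 0.5),
    ((0, 2), 2.0, 0.4),   # dominated by (0, 1): larger F1 and smaller PE
    ((0, 3), 0.5, 0.3),   # not dominated: smallest F1 in its group
    ((1, 0), 9.0, 0.1),   # different source point, never compared with the others
]
kept = batch_prune(cands)
```

The pairwise comparison is quadratic in the group size; it is applied offline, after incremental pruning has already reduced each group.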



Algorithm 3 RouteOnline

Input: A set of the sequence candidates $R$, the current position of a cab $c_0$, and the minimum length $L_{min}$ and maximum length $L_{max}$ of the suggested driving route ($1 \le L_{min} \le L_{max} \le N$).
Output: A set of the optimal driving routes $D_{min}$.

    $D_{min} \leftarrow \emptyset$; $F_{min} \leftarrow +\infty$;
    for $L = L_{min}$ to $L_{max}$ do
        for each $r \in R^L$ do
            $c = s(r)$;
            $d = \langle c_0, r \rangle$;
            $F(d) = F_1(r) \cdot (1 - P(c)) + D_{c_0,c} \cdot P_E(r) + D_\infty \cdot (1 - P_E(r))$;
            if $D_{min} = \emptyset \vee F(d) = F_{min}$ then
                $D_{min} \leftarrow D_{min} \cup \{d\}$;
            else
                if $F(d) < F_{min}$ then
                    $D_{min} \leftarrow \{d\}$; $F_{min} \leftarrow F(d)$;
                end if
            end if
        end for
    end for
    return $D_{min}$;
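
The online search can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `route_online`, the `(sequence, F1, PE)` tuple layout, and the parameter names `d0` (distances from the cab position to each pick-up point) and `d_inf` (the value of $D_\infty$) are assumptions. The sketch updates the best cost whenever the best set changes, so the first evaluated route is handled by the ordinary "strictly smaller" branch.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[Tuple[int, ...], float, float]  # (sequence, F1, PE)


def route_online(
    candidates: List[Candidate],
    P: Sequence[float],
    d0: Sequence[float],
    d_inf: float,
) -> Tuple[List[Tuple[int, ...]], float]:
    """Return the candidate sequences with minimal PTD from the cab position."""
    best_routes: List[Tuple[int, ...]] = []
    best_cost = float("inf")
    for seq, f1, pe in candidates:
        c = seq[0]                                        # source point s(r)
        cost = f1 * (1.0 - P[c]) + d0[c] * pe + d_inf * (1.0 - pe)
        if cost < best_cost:
            best_routes, best_cost = [seq], cost          # strictly better route
        elif cost == best_cost:
            best_routes.append(seq)                       # equally good route
    return best_routes, best_cost


# Hypothetical toy instance: two length-1 candidates.
P = [0.5, 0.2]
candidates = [((0,), 0.0, 0.5), ((1,), 0.0, 0.2)]
routes, cost = route_online(candidates, P, d0=[1.0, 2.0], d_inf=10.0)
```

Each candidate is evaluated in constant time from its stored $F_1$ and $P_E$ values, so the search time is proportional to the number of remaining candidates.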



5.3 Analysis of Computational Complexity 

In this subsection, we analyze the computational complexities of the offline sequence generation 
and the online search algorithm respectively. 

5.3.1 Offline processing algorithms 

We first analyze the computational complexity of our offline algorithm BP-Growth. The key step in BP-Growth is the incremental process of sequence growing and pruning. To generate the potential sequences with length L, we append each pick-up point c ahead of the sequence candidates with length L - 1 that do not contain c. When the length of the potential sequence is L = 1, all pick-up points are enumerated, so the number of steps is N. When L = 2, the number of sequence candidates with length 1 is N. Since each pick-up point appears only once in a potential sequence, we still have N - 1 possible pick-up points for each sequence candidate. Therefore, the key step is executed N(N - 1) times for L = 2. When we generate the potential sequences with length L > 2, the number of the remaining sequence candidates with length L - 1 after the incremental pruning process is $\binom{N}{L-1} \cdot (L-1)! \,/\, (L-2)!$. We still have N - L + 1 pick-up points to append to the heads of these sequence candidates, so the number of executions is

$$\left( \binom{N}{L-1} \cdot (L-1)! \,/\, (L-2)! \right) \cdot (N - L + 1) = \binom{N}{L} \cdot L \cdot (L-1).$$

It can be seen that the computational complexity of the process with length L = 1 is $O(N)$. It increases gradually and reaches its peak at $L = \lceil N/2 \rceil$. After that, the computational complexity decreases and drops to $O(N^2)$ at L = N.

We then present the computational complexity analysis of our algorithm BP-Growth for generating all possible sequences with lengths from 1 to N.

Given a set of pick-up points $C$ with $|C| = N$, the total number of executions for generating all the potential sequences with length $L \le N$ is

$$f(N) = N + N \cdot (N-1) + \sum_{L=3}^{N} (L-1) \cdot L \cdot \binom{N}{L}.$$

$f(N)$ can be transformed to

$$f(N) = N + 2\binom{N}{2} + 2 \cdot 3 \binom{N}{3} + \cdots + (N-2)(N-1)\binom{N}{N-1} + (N-1) \, N \binom{N}{N}.$$

Since $L \cdot \binom{N}{L} = (N - L + 1) \binom{N}{L-1}$, $f(N)$ can also be described by the following equation:

$$f(N) = N + (N-1)\binom{N}{1} + 2(N-2)\binom{N}{2} + 3(N-3)\binom{N}{3} + \cdots + (N-1)\binom{N}{N-1}.$$

Adding the two equations above, we obtain

$$2f(N) = 2N + (N-1)\binom{N}{1} + 2(N-1)\binom{N}{2} + 3(N-1)\binom{N}{3} + \cdots + N(N-1)\binom{N}{N} = 2N + N(N-1) \cdot 2^{N-1},$$

where the last step uses the identity $\sum_{L=1}^{N} L \binom{N}{L} = N \cdot 2^{N-1}$.

Then $f(N) = N + N(N-1) \cdot 2^{N-2}$. Thus, $O(f(N)) = O(N^2 \cdot 2^N)$.

In summary, the computational complexity of the incremental generation of the potential sequences with all possible lengths L ($1 \le L \le N$) via BP-Growth is $O(N^2 \cdot 2^N)$.
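
The closed form $f(N) = N + N(N-1) \cdot 2^{N-2}$ can be checked numerically against the term-by-term sum. The short Python sketch below does this for small N; the function names are hypothetical, and the check assumes $N \ge 2$ so that the exponent is non-negative.

```python
from math import comb


def growth_steps(N: int) -> int:
    """Sum the per-length execution counts: N for L = 1, then (L-1) * L * C(N, L)."""
    total = N
    for L in range(2, N + 1):
        total += (L - 1) * L * comb(N, L)   # for L = 2 this equals N * (N - 1)
    return total


def closed_form(N: int) -> int:
    """Closed form N + N(N-1) * 2^(N-2), valid for N >= 2."""
    return N + N * (N - 1) * 2 ** (N - 2)


# Compare both expressions for a range of small N.
agreement = all(growth_steps(N) == closed_form(N) for N in range(2, 16))
```

For example, N = 3 gives 3 + 6 + 6 = 15 steps from the sum and 3 + 6 * 2 = 15 from the closed form.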

5.3.2 Online search algorithm 

The computational complexity of our online search algorithm with $L_{min} = L_{max} = L$ directly depends on the number of the remaining sequence candidates in $R^L$. For the set of sequence candidates $R^L$ produced by the algorithm BP-Growth with incremental pruning, $|R^L| = \binom{N}{L} \cdot L$. Therefore, when we set L = 1 or L = N, the computational complexity of our online search algorithm RouteOnline is $O(N)$. When $L = \lceil N/2 \rceil$, the computational complexity is the highest, which is close to $O(N \cdot 2^N)$. If we use both the incremental and the batch pruning processes, the search efficiency can be significantly enhanced; however, it is hard to obtain a precise analysis of its computational complexity. As for the search time of a route query with a minimum length $L_{min}$ and a maximum length $L_{max}$, it is just the sum of the search times over each set of sequence candidates $R^L$ ($L_{min} \le L \le L_{max}$).


6 EXPERIMENTAL EVALUATIONS 

In this section, we evaluate the performance of our method by comparing its pruning effect, memory consumption and online search time with those of other state-of-the-art methods. All acronyms of the evaluated algorithms are given in Table 2. LCP and SkyRoute are two route dominance based pruning algorithms proposed in [2]. In particular, SkyRoute is an online pruning algorithm, where two skyline computing methods, BNL and D&C, can be applied to prune potential sequences [6]. Its corresponding online search methods are denoted by SR(BNL)S and SR(D&C)S, respectively. All the algorithms were implemented in Visual C++ 6.0. The experiments were conducted on a PC with an Intel Pentium Dual E2180 processor and 4GB RAM.

6.1 Data Sets 

The adopted experimental data sets are divided into two categories: real-world data and synthetic data.

Real-World Data. In the experiments, we adopt the real-world cab mobility traces used in [2], which are provided by Exploratorium - the museum of science, art and human perception. The data set contains GPS location traces of 514 taxis collected over about 30 days in the San Francisco Bay Area. We extract 21,980 and 38,280 historical pick-up locations of all the taxi drivers in two time periods: 2PM-3PM and 6PM-7PM. In total, we obtain 10 and 25 clusters as well as their probabilities on these two real data sets using the same method adopted in [2].

Synthetic Data. We also generate four synthetic data sets. Specifically, we randomly generate 
potential pick-up points and their pick-up probabilities within a special area by a standard uniform 
distribution. In total, we have four synthetic data sets with 10, 15, 20 and 25 pick-up points 
respectively. The Euclidean distance instead of the driving distance is adopted to measure the 
distances between pairs of pick-up points. 
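
A synthetic data set of this kind can be generated with a few lines of Python. The sketch below is an assumption-laden illustration: the function name `make_synthetic`, the square area of side `side`, and the fixed random seed are choices made for the example and are not taken from the experiments.

```python
import math
import random
from typing import List, Tuple


def make_synthetic(
    n: int, seed: int = 0, side: float = 1.0
) -> Tuple[List[Tuple[float, float]], List[float], List[List[float]]]:
    """Draw n pick-up points and probabilities uniformly; return Euclidean distances."""
    rng = random.Random(seed)                       # seeded for reproducibility
    points = [(rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(n)]
    probs = [rng.random() for _ in range(n)]        # pick-up probabilities in [0, 1)
    dist = [[math.dist(a, b) for b in points] for a in points]
    return points, probs, dist


points, probs, dist = make_synthetic(10, seed=1)
```

The returned distance matrix is symmetric with a zero diagonal, which is what the Euclidean measure implies.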

For both real-world and synthetic data, we randomly generate the positions of the target cab 
for recommendation. 

6.2 The Overall Comparison of Pruning Effect 

Algorithms BFS and LCP need to enumerate all possible sequences of a certain length L. For a set of potential pick-up points $C$ with $|C| = N$, the number of all possible sequences with length L is $\binom{N}{L} \cdot L!$, and the computational complexity is $O(N!)$ when L = N. When the number of pick-up points N or the length of the suggested route L is even moderately large (e.g., N = 20 and L = 6), neither BFS nor LCPS can finish the enumeration process even after a rather long time. Therefore, when we analyze how the pruning ratio varies with the length of the suggested driving routes on the same set of pick-up points, we keep the number of pick-up points small (e.g., |C| = 10) in order to show the overall comparison of all the algorithms concerned. When we analyze how the pruning ratio varies with the number of pick-up points for a fixed length of driving routes, we also keep the length of the routes small (e.g., L = 3 and L = 5). For the algorithms proposed in this paper, since the sequences are pruned incrementally, both the time and space complexity are better than those of BFS and LCP. Thus, we can use the synthetic data set with |C| = 25 to analyze the pruning effect of the proposed incremental algorithm BP-Growth in detail.

Table 2: Some acronyms used in experimental analysis.

    Acronym     Description
    LCP         Sequence pruning via route dominance
    SkyRoute    Sequence pruning via skyline query
    SR(BNL)     SkyRoute with skyline computing method BNL
    SR(D&C)     SkyRoute with skyline computing method D&C
    IP          Generation of potential sequence candidates via BP-Growth with incremental pruning
    IBP         Generation of potential sequence candidates via BP-Growth with Incremental and Batch Pruning
    BFS         Brute-force search
    LCPS        Search via LCP
    SR(BNL)S    Skyline search via the algorithm SkyRoute + BNL
    SR(D&C)S    Skyline search via the algorithm SkyRoute + D&C
    IPS         Search on the potential sequences generated by IP
    IBPS        Search on the potential sequences generated by IBP

6.2.1 The Pruning Ratio Varying with the Length of Potential Sequence

[Figure 5 panels: (a) Real-World Data; (b) Synthetic Data. Horizontal axis: Length of Potential Sequences (L).]

Figure 5: The pruning ratio of different algorithms w.r.t. the length of potential sequence on the data sets with |C| = 10.

Figure 5 shows how the pruning ratios of several algorithms vary with the length of the potential sequence on both real-world and synthetic data with |C| = 10. Algorithms LCP, SkyRoute, IP and IBP are all able to prune some non-optimal sequences derived from C. When the length L = 2 or L = 3, the proposed algorithms IBP and IP perform worse than the algorithms SkyRoute and LCP. However, as the length of the potential sequence L increases, the pruning ratios of our algorithms IP and IBP both improve significantly. It can be observed that IBP outperforms SkyRoute and IP outperforms LCP on both real and synthetic data when L > 5. Furthermore, the pruning ratios of our algorithms gradually improve and are close to 1 when the length of the suggested driving route L > 6. In contrast, the pruning ratio of LCP follows a parabola-like trend. When L > 5, the pruning ratios of LCP and SkyRoute both gradually drop. When the length is equal to the number of pick-up points (i.e., L = |C|), their pruning ratios decrease to 0.

To verify that our method can process the potential sequences derived from a larger number of pick-up points, we test the pruning ratios of IP and IBP on the synthetic data set with |C| = 25. We find that the trends of the pruning ratios of our algorithms on different data sets are consistent. Since LCP and SkyRoute are only able to deal with driving routes with L < 5 on the data set with |C| = 25, we cannot obtain their complete results on this bigger data set.

Let us analyze the reason why the pruning ratios of algorithms IP and IBP are relatively high. First, for the incremental pruning algorithm IP, its pruning ratio is equal to $1 - 1/(L-1)!$, which increases dramatically with the length of the potential sequence. When L = 6, the pruning percentage has reached 99.2%.

Figure 6: The number of remaining sequence candidates after using the pruning algorithms IP and IBP respectively on the synthetic data set with |C| = 10.

In addition, the algorithm IBP also adopts a batch pruning process to remove some non-optimal potential sequences. As shown in Figures 6 and 7, the number of remaining sequences of IBP can be several orders of magnitude smaller than that of IP, especially when |C| is large, which demonstrates the effectiveness of batch pruning. The overall trend of the number of the remaining sequence candidates resembles a Gaussian distribution. It first increases with the length L, and then decreases when L > |C|/2. Moreover, it is close to the number of pick-up points |C| when $L \to |C|$, which is completely consistent with the analysis of Section 3.

Figure 7: The number of remaining sequence candidates after using the pruning algorithms IP and IBP respectively on the synthetic data set with |C| = 25.

In terms of LCP, whether a route is dominated by another route depends on the value of each dimension of the vector DP. When the value of L is small, the pruning ratio grows somewhat with the length. However, when the sequence length becomes larger, the number of dimensions of the vector DP increases and the probability of domination in each dimension between DP vectors becomes lower, which leads to the gradual decline of the pruning ratio. When L = |C|, since all pick-up points are involved, it is impossible to make the probability of each pick-up point in a sequence larger than that of another. Thus, the pruning percentage is 0 in this case. As for SkyRoute, since its principle is similar to that of LCP, the overall trends of the two are almost the same.

6.2.2 The Pruning Ratio Varying with the Number of Pick-up Points

[Figure 8 panels: (a) L = 3; (b) L = 5.]

Figure 8: The pruning ratio of different algorithms varying with the number of pick-up points on the synthetic data with |C| = 25.

Figure 8 shows how the pruning ratio varies with the number of pick-up points on the synthetic data with |C| = 25 when L = 3 and L = 5, respectively. It can be observed that the pruning ratios of LCP and IBP increase with the number of pick-up points, and the pruning ratio of IP is constant. When L = 3, the pruning ratio of IP is equal to 0.5 and the pruning ratio of IBP gradually increases with the number of pick-up points. However, in this case our algorithms do not perform better than algorithms LCP and SkyRoute. When L = 5, the pruning ratio of IP is more than 0.95 and the pruning ratio of IBP is close to 1, which are much higher than those of LCP and SkyRoute, respectively.



6.3 Analysis of the Memory Consumption

Figure 9: The internal memory consumption varying with the length of suggested driving routes on the synthetic data sets with |C| = 10 and |C| = 20.

Figure 9 shows how the consumed internal memory varies with the length of the potential sequence on the synthetic data sets with |C| = 10 and |C| = 20, respectively. When the length L < 4, the memory cost of IP is a little higher than those of the other three algorithms due to the storage of some iteratively calculated values, such as $F_1$ and $P_E$. However, as the length of the potential sequence L increases, the memory consumption of LCP, SR(BNL) and SR(D&C) increases dramatically. Our algorithm IP follows an overall parabola-like trend, which reaches its peak at 700K and 600M of RAM on these two data sets, respectively. Among algorithms LCP, SR(BNL) and SR(D&C), the space cost of SR(D&C) increases fastest due to its recursive calculation process. In summary, the trends of the space cost of our algorithm IP on different data sets are consistent, and they are almost the same as the trend of the remaining sequence candidates.

Let us analyze the reason why the space cost of algorithm IP is relatively low. The generated sequence candidates and the associated values of $P_E$ and $F_1$ are stored in RAM only during the incremental process of generating the potential sequences from length L to L + 1. Thus, the space cost is determined by the number of the remaining sequence candidates with lengths L and L + 1 generated by IP. The number of the enumerated sequences increases dramatically with the number of pick-up points N and the length of potential sequences L. Therefore, when the number of pick-up points N becomes larger, the number of the remaining sequence candidates increases dramatically. Since the algorithm IP uses an incremental method to generate and reduce the potential sequences, its space performance is much better than those of BFS and LCP. Nevertheless, when the number of pick-up points is larger (e.g., |C| = 25), the generation process of IP cannot be performed in the internal memory of a PC with 4G RAM. In this case, we have to adopt external memory storage technology to generate the potential sequences. For algorithms BFS, LCP and SkyRoute, it is necessary to enumerate all possible sequences of a certain length, so the internal memory consumption is huge. As shown in Figure 9(a), they can only deal with the potential sequences with length L < 5 on the data set with |C| = 10. It can be observed that the memory consumption of our algorithm is much lower than those of the other methods.

Table 3: A comparison of search time (millisecond) on the real-world data set (2-3PM).

                 L = 2        L = 3        L = 4         L = 5
    BFS          0.0077       0.0286       0.1980        1.4341
    LCPS         0.0073       0.0157       0.0371        0.1175
    SR(D&C)S     10.3269      41.6794      182.0520      1520.3100
    SR(BNL)S     1.9306       24.3543      139.0600      2333.1200
    IPS          0.0070       0.0110       0.0165        0.0246
    IBPS         0.0068       0.0069       0.0076        0.0085

Table 4: A comparison of search time (millisecond) on the synthetic data set with |C| = 15.

                 L = 2        L = 3        L = 4         L = 5
    BFS          0.0125       0.0925       1.3028        17.9584
    LCPS         0.0120       0.0458       0.3002        1.9866
    SR(D&C)S     24.7075      154.6790     1962.8100     31210.4000
    SR(BNL)S     3.3707       109.5560     3612.4100     210161.0000
    IPS          0.0089       0.0556       0.2317        0.7322
    IBPS         0.0086       0.0095       0.0107        0.0119


6.4 The Comparison of Online Search Time 

In this subsection, we compare the efficiency of various online route search algorithms. Note that all the reported search times are averages over 10 runs. Tables 3, 4, 5 and 6 show the online search time consumed by algorithms BFS, LCPS, SR(D&C)S, SR(BNL)S, IPS and IBPS on both real-world and synthetic data sets with various numbers of pick-up points and lengths of suggested driving routes. It can be observed that the search time consumed by our algorithm IBPS is the lowest. The online route search of IPS is a little slower than that of IBPS. However, both of them consistently outperform the other four algorithms. For IPS and LCPS, when L is small (e.g., L = 3), the pruning ratio of IP is a little lower than that of LCP. However, our algorithm IP still outperforms LCP thanks to the recursive computation of the PTD cost. The search time of the two skyline methods SR(D&C)S and SR(BNL)S is much longer. The major reason is that the skyline query is processed online, which takes a rather long time. Therefore, this type of method is not suitable for recommending driving routes for a single cab. It performs better in providing multiple optimal driving routes for different cabs at the same place and time.

Figure 10 shows the curves of the search time varying with the length of suggested routes on the synthetic data set with |C| = 10. A comparison of the search time for a certain length of driving routes L of all the five algorithms above is given in Figure 10(a). The search time of SR(D&C)S and SR(BNL)S increases dramatically with the length of the suggested route. Since the number of remaining sequence candidates is very small, the search time consumed by our algorithms IBPS and IPS is always lower than those of the other four algorithms, and the gap becomes more and more obvious as the length of suggested driving routes increases.

In addition, we perform significance tests using a paired t-test when the length is small. Table 7 shows the average search time for LCPS, IPS and IBPS. The table also shows the p values from a paired t-test for IPS and IBPS compared to LCPS. It can be observed that the search time consumed by our algorithms is always significantly lower than that of LCPS.

Table 5: A comparison of search time (millisecond) on the synthetic data set with |C| = 20.

                 L = 2        L = 3        L = 4          L = 5
    BFS          0.0189       0.2244       4.6117         89.7804
    LCPS         0.0185       0.1092       0.8328         7.9164
    SR(D&C)S     40.1345      617.7660     11858.1000     301341.0000
    SR(BNL)S     11.0259      769.1580     49492.5000     5453600.0000
    IPS          0.0149       0.0584       0.3480         1.5331
    IBPS         0.0120       0.0141       0.0172         0.0202

Table 6: A comparison of search time (millisecond) on the synthetic data set with |C| = 25.

                 L = 2        L = 3        L = 4          L = 5
    BFS          0.0262       0.4479       11.9400        305.5830
    LCPS         0.0260       0.1919       2.0199         22.6318
    SR(D&C)S     52.1815      1362.7300    36792.0000     1093460.0000
    SR(BNL)S     17.7717      1559.3200    144124.0000    24283100.0000
    IPS          0.0189       0.1383       0.8135         5.1262
    IBPS         0.0148       0.0182       0.0242         0.0446

Figure 10: The search time varying with the length of suggested driving routes on the synthetic data set with |C| = 10: (a) an overall comparison of various algorithms; (b) the comparison between IPS and IBPS.

In order to make the trend of the search time clearer, the curves of our algorithms with $L_{min} = L_{max} = L$ and with $L_{min} = 1$, $L_{max} = L$ on the synthetic data set with |C| = 10 are shown in Figure 10(b). We can see that the search time of our algorithm IPS for a certain length of driving routes L also shows a parabola-like trend, the same as the trend of the remaining sequence candidates. After the batch pruning, the number of remaining sequences becomes so small that the search time of IBPS is almost constant. The search time of our algorithms IPS and IBPS with $L_{min} = 1$ and $L_{max} = L$ gradually increases with the maximal route length L. When L = 10, the search time of our algorithms over all possible driving routes with $1 \le L \le 10$ is still less than 0.14ms. In summary, our online search algorithm has a much lower search time than other existing methods. Moreover, it has a more flexible length constraint.

Table 7: The paired t-test compared to LCPS.

                          L = 2        L = 3        L = 4        L = 5
    mean(LCPS)            0.0079       0.0191       0.0717       0.3381
    mean(IPS)             0.0069       0.0101       0.0161       0.0225
    mean(IBPS)            0.0060       0.0065       0.0076       0.0081
    t-test(IPS, LCPS)     p = 0.001    p = 0.000    p = 0.00     p = 0.000
                          p < 0.01     p < 0.01     p < 0.01     p < 0.01
    t-test(IBPS, LCPS)    p = 0.000    p = 0.000    p = 0.000    p = 0.000
                          p < 0.01     p < 0.01     p < 0.01     p < 0.01


7 Discussion 

In this section, we discuss some extensions of our method. 

7.1 Multiple Evaluation Functions 

The PTD function is a measure for evaluating the cost of a driving route. To meet different business requirements, we can adopt other evaluation functions. Two examples are given as follows.

The Potential Travel Time (PTT) [26]. Since the driving time between two pick-up points usually depends on the traffic flow on the road, the distance does not always represent the cost of a travel route properly. Thus, it is also valuable to recommend a route with the least driving time. Let us give the definition of PTT. Assume that $T_{c_{i-1},c_i}$ is the driving time from $c_{i-1}$ to $c_i$ during a certain period of time. If, in the PTD formula, we replace the distance $D_{c_{i-1},c_i}$ with the travel time $T_{c_{i-1},c_i}$ and $D_\infty$ with $T_\infty$, we get the potential travel time function

$$F_T(d) = T_{c_0,c_1} \cdot P(c_1) + (T_{c_0,c_1} + T_{c_1,c_2}) \cdot (1 - P(c_1)) \cdot P(c_2) + \cdots + T_\infty \cdot \prod_{i=1}^{L} (1 - P(c_i)), \qquad (10)$$

where $T_\infty$ denotes the desired maximum cruising time for a driver to pick up new passengers.

The Potential Travel and Waiting Time (PTW). In real life, taxi drivers usually get passengers in two ways: cruising on a road and waiting in a place [H S]. Assume that a cab travels along a driving route $d = (c_0, c_1, c_2, \ldots, c_L)$ ($1 \le L \le N$), has not gotten a passenger when arriving at the last pick-up point $c_L$, and waits at $c_L$. Let the waiting time be a fixed value $T_W$ and the probability that the cab successfully gets a passenger at $c_L$ during the waiting time $T_W$ be $P_W(c_L)$. The cruising time vector of $d$ is

$$T(d) = \left\langle T_{c_0,c_1},\; (T_{c_0,c_1} + T_{c_1,c_2}),\; \ldots,\; \sum_{i=1}^{L} T_{c_{i-1},c_i} \right\rangle$$

and its probability vector is

$$P(d) = \left\langle P(c_1),\; (1 - P(c_1)) \cdot P(c_2),\; \ldots,\; \prod_{i=1}^{L-1} (1 - P(c_i)) \cdot P(c_L) \right\rangle.$$

Then the time cost of successfully picking up a passenger by cruising is $F_C(d) = T(d) \cdot P(d)$. The time cost of picking up a passenger by waiting at the last point $c_L$ is

$$F_W(d) = \left( \sum_{i=1}^{L} T_{c_{i-1},c_i} + T_W \right) \cdot \prod_{i=1}^{L} (1 - P(c_i)) \cdot P_W(c_L).$$

The time cost when a driver does not get passengers after leaving the last point $c_L$ can be set to

$$F_\infty = T_\infty \cdot \prod_{i=1}^{L} (1 - P(c_i)) \cdot (1 - P_W(c_L)).$$

Then, the PTW of route $d$ can be given as

$$F_{CW}(d) = F_C(d) + F_W(d) + F_\infty. \qquad (11)$$
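
The PTW cost can be evaluated directly from the leg times and pick-up probabilities, as in the Python sketch below. This is an illustration under stated assumptions: the function name and parameter names (`times`, `probs`, `t_wait`, `p_wait`, `t_inf`) are hypothetical, with `times[i]` standing for the driving time of the i-th leg and `probs[i]` for the pick-up probability at the point where that leg ends.

```python
from typing import Sequence


def potential_travel_and_waiting_time(
    times: Sequence[float],
    probs: Sequence[float],
    t_wait: float,
    p_wait: float,
    t_inf: float,
) -> float:
    """Sum of the cruising, waiting and no-passenger time costs of a route."""
    cruise = 0.0      # cost of picking up a passenger while cruising
    elapsed = 0.0     # cumulative driving time up to the current point
    miss = 1.0        # probability that no passenger was picked up so far
    for t, p in zip(times, probs):
        elapsed += t
        cruise += elapsed * miss * p
        miss *= 1.0 - p
    wait = (elapsed + t_wait) * miss * p_wait      # picked up while waiting at the last point
    none = t_inf * miss * (1.0 - p_wait)           # no passenger at all
    return cruise + wait + none


# Hypothetical two-leg route.
ptw = potential_travel_and_waiting_time([2.0, 3.0], [0.5, 0.5], t_wait=4.0, p_wait=0.5, t_inf=10.0)
# With p_wait = 0 the waiting term vanishes and the value reduces to the PTT of Formula (10).
ptt = potential_travel_and_waiting_time([2.0, 3.0], [0.5, 0.5], t_wait=4.0, p_wait=0.0, t_inf=10.0)
```

In this example the cruising term is 2 * 0.5 + 5 * 0.25 = 2.25, the waiting term is 9 * 0.25 * 0.5 = 1.125, and the no-passenger term is 10 * 0.25 * 0.5 = 1.25.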




(a) (b) 

Figure 11: The detected potential pick-up points and cq on real- world data set of 6- 7PM. 

With various evaluation functions, we can easily recommend different types of optimal driving
routes to the drivers. For example, in Figure 11, ten pick-up points $c_i$ $(1 \le i \le 10)$ revealed on
the real-world data set of 6PM-7PM and the current position $c_0$ of an empty cab are labeled on
the map. When we set $L = 3$, the optimal driving routes detected by PTD, PTT and PTW are
$c_0 \to c_4 \to c_5 \to c_7$, $c_0 \to c_5 \to c_7 \to c_4$ and $c_0 \to c_5 \to c_4 \to c_7$, respectively. For
$L = 5$, the optimal driving route evaluated by PTD is $c_0 \to c_9 \to c_1 \to c_5 \to c_7 \to c_8$,
which is labeled in Figure 11(a), while the optimal driving routes evaluated by the PTT and PTW
functions are the same: $c_0 \to c_5 \to c_7 \to c_4 \to c_{10} \to c_8$. The optimal driving routes
therefore differ from one evaluation function to another, so each function suits a different
application.
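
The effect of swapping the evaluation function can be illustrated with a small exhaustive search. The sketch below is not the proposed pruning-based method; it simply enumerates every route of a given length and scores it with a pluggable leg-cost table (distances or driving times). All names (`expected_cost`, `best_route`) and the toy data in the usage note are hypothetical.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple


def expected_cost(legs: Sequence[float], probs: Sequence[float],
                  c_inf: float) -> float:
    """Expected cost of a route: cumulative leg cost weighted by the
    probability that the first pick-up happens at each point."""
    cost, elapsed, no_pickup = 0.0, 0.0, 1.0
    for leg, p in zip(legs, probs):
        elapsed += leg
        cost += elapsed * no_pickup * p
        no_pickup *= 1.0 - p
    return cost + c_inf * no_pickup


def best_route(points: Sequence[str], length: int,
               leg_cost: Dict[Tuple[str, str], float],
               prob: Dict[str, float], c_inf: float,
               start: str = "c0") -> Tuple[float, Tuple[str, ...]]:
    """Brute-force search over all ordered selections of `length` points."""
    best = None
    for seq in permutations(points, length):
        route = (start, *seq)
        legs = [leg_cost[(a, b)] for a, b in zip(route, route[1:])]
        cost = expected_cost(legs, [prob[c] for c in seq], c_inf)
        if best is None or cost < best[0]:
            best = (cost, route)
    return best
```

As a usage example on made-up data: with two points of equal pick-up probability 0.5, distances `c0-c1 = 1`, `c0-c2 = 2`, `c1-c2 = 1` and driving times `c0-c1 = 5`, `c0-c2 = 1`, `c1-c2 = 1`, the distance table favours `c0, c1, c2` while the time table favours `c0, c2, c1`, which mirrors the observation that different evaluation functions can yield different optimal routes.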



7.2 Recommendation with Destination Constraint 

Our method can also deal with the MSR problem under a destination point constraint. For
example, if a driver wants to travel to a specified destination point, we can generate all possible
potential sequences with the same source and destination points using our proposed algorithm
BP-Growth. The only difference is that we perform the cost comparison and pruning only among
the potential sequences with the same source and destination points. Then we can recommend
the optimal driving route satisfying the destination constraint to the driver online. Moreover, if
the driver wants to wait for passengers at the destination point, we can treat the destination
point as a temporary parking place and perform recommendation using the PTW measure presented
above. For example, as shown in Figure 11(b), an optimal driving route revealed by the PTW
function is $c_0 \to c_5 \to c_7 \to c_4 \to c_{10} \to c_1$ when we set $c_1$ as the destination, $L_{\min} = 3$ and

7.3 Load Balance for Parallel Recommendations 

We briefly discuss how to make the recommendation suitable for many cabs in the same area at
the same time. For the generalized MSR problem, since both of the proposed algorithms IP and
IBP deal with the potential sequences with the same source point, we can obtain the optimal driving
routes starting from each pick-up point. Thus, we can get $N$ optimal driving routes with different
source points. To perform the recommendation for multiple empty cabs simultaneously, we can
adopt the load balancing techniques used in [2]. The round-robin strategy keeps a count of the
empty cabs requesting the service and, for the $k$-th request, has the system choose one of the $N$
optimal driving routes in a circular manner [24], [25]. For example, we can recommend
the No. 1 route with source point $c_1$ to the empty cab that first requests the service, recommend
the No. 2 route with source point $c_2$ to the second empty cab, and so on, and then recommend the
No. 1 route again for the $(N + 1)$-th request.
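
The circular assignment described above can be sketched in a few lines of Python. The class name `RoundRobinRecommender` is hypothetical, and the sketch assumes the $N$ optimal routes have already been computed offline and are simply handed out in request order.

```python
from typing import List, Sequence, TypeVar

Route = TypeVar("Route")


class RoundRobinRecommender:
    """Hand out N precomputed routes to successive requests in circular order."""

    def __init__(self, routes: Sequence[Route]) -> None:
        if not routes:
            raise ValueError("at least one route is required")
        self._routes: List[Route] = list(routes)
        self._requests = 0   # number of requests served so far

    def recommend(self) -> Route:
        # The k-th request (k = 1, 2, ...) receives route number ((k - 1) mod N) + 1.
        route = self._routes[self._requests % len(self._routes)]
        self._requests += 1
        return route
```

With three routes, four consecutive calls to `recommend()` return route No. 1, No. 2, No. 3 and then No. 1 again, matching the $(N + 1)$-th request case in the text.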

8 Conclusion 

This paper presents a dynamic programming based method to solve the problem of mobile sequential
recommendation. The proposed method exploits the iterative nature of the cost function and
multiple pruning policies, which greatly improve the pruning effect. The overall time complexity for
handling the mobile sequential recommendation problem without length constraint is reduced
from $O(N!)$ to $O(N^2 \cdot 2^N)$. Experimental results show that the pruning effect and the online search
time are better than those of existing methods. In the future, it will be interesting to use
parallel algorithms for sequence generation and recommendation.